Standardizing Protein Corona Characterization in Nanomedicine: A Multicenter Study to Enhance Reproducibility and Data Homogeneity

We recently revealed significant variability in protein corona characterization across various proteomics facilities, indicating that data sets are not comparable between independent studies. This heterogeneity mainly arises from differences in sample preparation protocols, mass spectrometry workflows, and raw data processing. To address this issue, we developed standardized protocols and unified sample preparation workflows, distributing uniform protein corona digests to several top-performing proteomics centers from our previous study. We also examined how using similar mass spectrometry instruments, standardized database search parameters, and unified data processing workflows affects data homogeneity. Our findings reveal a remarkable stepwise improvement in protein corona data uniformity: the overlap in identified proteins increased from 11% to 40% across facilities when similar instruments and a uniform database search were used. We identify the key parameters behind data heterogeneity and provide recommendations for designing experiments. Our findings should significantly advance the robustness of protein corona analysis for diagnostic and therapeutic applications.

−4 This biological layer transforms nanoparticles by endowing them with a new identity that influences how they are recognized and interact within biological systems, thus defining their safety, biodistribution, and diagnostic and therapeutic efficacy. 3,5,6 Despite significant advances in nanomedicine, a limited understanding of nanoparticles' biological identity remains a major barrier to their successful clinical translation. 7 A thorough understanding of the protein corona composition is crucial for predicting the in vivo biological fate of nanoparticles, which is essential for their clinical application and the advancement of future nanomedicine technologies. 7 Over the past decade, extensive research has aimed to enhance the reproducibility of nanobio interfaces. 8,9−13 Our recent studies show significant discrepancies in the outcomes and interpretations of identical protein corona samples across various proteomics facilities, attributed to differences in sample preparation protocols, LC-MS workflows, instrumentation, and data processing. 13 Consequently, there is an urgent need to develop a standardized protocol for protein corona analysis to reduce data heterogeneity and facilitate comparability across studies. 13,14 Our previous research indicates that major disparities in the characterization of the protein corona composition are primarily due to variations in sample preparation protocols, LC-MS workflows, and data processing. 13 We also revealed that harmonizing database search and data processing can significantly reduce the observed heterogeneity among core facilities. 
15 Different proteomics core facilities employ a range of sample preparation methods, instrumentation, quantification approaches, search parameters, and data processing techniques. These differences can significantly bias the outcomes of the protein corona analyses of identical samples. −21 The primary goal of this follow-up study is to minimize the main sources of variation and improve protein corona data homogeneity, focusing on harmonizing sample preparation protocols, LC-MS instrumentation and workflows, database searches, and processing strategies. Ultimately, similar to other nanomedicine techniques and methods, 6 we aim to establish a standardized protocol that will streamline the analysis of the protein corona across different core facilities, facilitating the comparability of protein corona data sets from various studies. More specifically, we prepared identical ready-to-inject batches of digested protein corona samples and sent them to various proteomics centers for analysis. We used an identical on-bead digestion protocol for all protein corona samples and submitted the final 4 (out of 6) identical batches of dried peptides to 4 proteomics core facilities in the United States that had provided relatively higher quality data in our previous study (called good centers) based on protein and peptide counts, coefficients of variation (CV) of technical replicates, and the median sequence coverage. 
13 In addition, to probe the role of the LC-MS instrument in the heterogeneity of protein corona data, we also sent another 4 identical batches of dried peptides to 4 proteomics core facilities that had similar instrumentation and were expected to perform similarly (i.e., identical LC and similar MS types; called similar centers). Two of these centers were shared with the above (good) centers, so in total we sent identical samples to 6 core facilities. Finally, we also performed a uniform database search on the raw data retrieved from the 4 centers with similar instruments, so that the sample preparation protocol, instrumentation, and data processing were all streamlined. In summary, we conducted the study in 3 steps: 1) unifying the sample preparation protocol; 2) unifying the sample preparation protocol and the instrumentation used; and 3) unifying the sample preparation protocol, instrumentation, and data processing. We show that implementing each step reduces the heterogeneity of protein corona composition data across core facilities. This stepwise approach can be used for protein corona analysis in the future, enabling rapid developments in nanomedicine-based diagnostics and therapeutics.

■ RESULTS
The overall workflow of this study is illustrated in Figure 1. We employed commercially available plain polystyrene nanoparticles with an average diameter of 78.8 nm. 13 Consistent with our previously published protocols, all protein corona-coated nanoparticles were prepared under identical conditions (refer to reference 13 for details). Supplementary Figure 1 illustrates the dynamic light scattering (DLS), zeta potential, and transmission electron microscopy (TEM) analyses of both bare and protein corona-coated nanoparticles. Notably, our prior research indicated minimal batch-to-batch variation in the physicochemical properties of the protein corona-coated nanoparticles. Bare nanoparticles were monodispersed with a narrow size distribution, measuring an average size of 78.8 ± 0.0 nm and a surface charge of −30.8 ± 0.8 mV. After exposure to human plasma, the average size increased to 111.3 ± 9.6 nm, and the surface charge changed to −10.2 ± 0.4 mV, confirming the formation of the protein corona. −24 The protein amount in each batch, approximately 1.6 μg, was quantified using a bicinchoninic acid (BCA) assay to ensure proper sample preparation for LC-MS analysis. These fully characterized samples were subsequently digested using a standardized protocol as described in the experimental section and dispatched to various proteomics facilities for analysis.
The Impact of Unifying Sample Preparation Protocol and Instrumentation. We submitted 4 identical batches of dried peptides to the 4 core facilities that had performed better in terms of protein counts, peptide counts, coefficients of variation (CV) of technical replicates, and median sequence coverage in our previous findings 13 (see Supporting Information for details regarding each core facility and the associated protocols). Furthermore, 4 identical batches were submitted to the 4 proteomics core facilities in the USA with similar instrumentation (with regard to LC and MS types). 
13 These centers used the same LC system (Dionex Ultimate 3000) and MS systems that should largely produce similar results (two used a Fusion Lumos, one a Fusion system, and one an HF-X system). All centers were asked to analyze the samples over a 120 min gradient using label-free quantification (LFQ). Because good performance in our original study was also taken into account, 13 we could not identify 4 core facilities with the exact same MS instruments and therefore selected these centers for analysis. The LC type and MS instruments, as well as the good/similar designations, are summarized in Table 1.
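One of the selection criteria named above is the coefficient of variation (CV) of technical replicates. The following minimal sketch shows how such a CV could be computed per protein and summarized as a median. The function names, the protein labels, and the intensity values are hypothetical illustrations, and the use of the sample standard deviation and of the median as the summary are assumptions rather than the procedure used in the study.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(values) / m * 100.0


def median_cv(protein_replicates):
    """Median CV over proteins that have at least two replicate values."""
    cvs = [
        coefficient_of_variation(values)
        for values in protein_replicates.values()
        if len(values) >= 2
    ]
    if not cvs:
        raise ValueError("no protein has two or more replicate values")
    return median(cvs)


# Hypothetical LFQ intensities for three technical replicates per protein.
example_intensities = {
    "protein_1": [100.0, 110.0, 90.0],
    "protein_2": [50.0, 50.0, 50.0],
    "protein_3": [10.0, 20.0, 30.0],
}

if __name__ == "__main__":
    for name, values in example_intensities.items():
        print(name, round(coefficient_of_variation(values), 1))
    print("median CV (%):", round(median_cv(example_intensities), 1))
```

A lower median CV indicates more consistent quantification between replicate injections at a given facility.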
We consolidated the data from 6 core facilities as detailed in Supplementary Data 1. As depicted in Figure 2a-b, even when selecting core facilities that outperformed others based on the previously mentioned criteria, the overlap of quantified proteins in identical samples across different centers was only about 11%. In contrast, for core facilities using similar instrumentation, the overlap was 18%. When considering only proteins quantified consistently across all three replicates, the overlap percentages were 8% and 14% for good and similar centers, respectively. These findings highlight the impact of instrumentation on the variability of the protein corona analysis outcomes. Moreover, protein intensities between different core facilities did not exhibit a consistent pattern, regardless of whether the centers were categorized as "good" and/or "similar" (Figure 2c). A principal component analysis of the proteomics data collected from the centers is shown in Figure 2d.
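The overlap percentages reported above can be illustrated with a short sketch. It assumes that overlap is defined as the number of proteins quantified by every center divided by the number of proteins quantified by at least one center. That definition, the function names, and the placeholder protein identifiers are assumptions for illustration and are not taken from the study's analysis code.

```python
def quantified_in_all_replicates(replicates):
    """Keep only proteins quantified in every technical replicate."""
    replicate_sets = [set(r) for r in replicates]
    if not replicate_sets:
        raise ValueError("no replicates supplied")
    return set.intersection(*replicate_sets)


def overlap_summary(proteins_by_center):
    """Return (shared count, total count, percent shared) across centers.

    Assumed definition: shared = proteins found by all centers,
    total = proteins found by at least one center.
    """
    center_sets = [set(p) for p in proteins_by_center.values()]
    if not center_sets:
        raise ValueError("no centers supplied")
    shared = set.intersection(*center_sets)
    total = set.union(*center_sets)
    percent = 100.0 * len(shared) / len(total) if total else 0.0
    return len(shared), len(total), percent


# Hypothetical protein identifiers reported by three centers.
example_centers = {
    "center_A": {"P1", "P2", "P3", "P4"},
    "center_B": {"P1", "P2", "P5"},
    "center_C": {"P1", "P2", "P3", "P6"},
}

if __name__ == "__main__":
    shared, total, percent = overlap_summary(example_centers)
    print(f"{shared} of {total} proteins shared ({percent:.1f}%)")
```

Applying `quantified_in_all_replicates` to each center's replicates before calling `overlap_summary` gives the stricter comparison restricted to proteins without missing values.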
The hierarchical clustering of the normalized protein intensities in Figure 3a-b shows the data completeness captured by each core facility for good and similar centers, respectively. The upset plots shown in Figure 3c-d demonstrate the uniqueness of the data obtained from the different core facilities for good and similar centers, respectively. These plots were made with all the proteins quantified across all the cores and demonstrate that the differences (uniqueness) in the data from each core still outweigh their similarities, despite streamlining the sample preparation and the instrumentation used. Note that similarities are higher for centers using similar instruments.
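The counts behind an upset plot can be sketched as follows. The sketch assumes exclusive intersections: each protein is assigned to exactly one group, namely the precise set of centers that quantified it. The function name and the example identifiers are hypothetical, and the plotting itself is omitted.

```python
from collections import Counter


def exclusive_intersections(proteins_by_center):
    """Count proteins per exact combination of centers that quantified them."""
    membership = {}
    for center, proteins in proteins_by_center.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(center)
    # Each protein contributes to exactly one combination of centers.
    return Counter(frozenset(centers) for centers in membership.values())


# Hypothetical protein identifiers reported by three centers.
example_centers = {
    "center_A": {"P1", "P2", "P3", "P4"},
    "center_B": {"P1", "P2", "P5"},
    "center_C": {"P1", "P2", "P3", "P6"},
}

if __name__ == "__main__":
    counts = exclusive_intersections(example_centers)
    for combination, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(sorted(combination), n)
```

Combinations containing a single center correspond to proteins unique to that facility, which is the "uniqueness" the plots are used to show.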
The Impact of Unifying Sample Preparation Protocol, Instrumentation, Database Search and Data Processing. As a final step, we subjected the raw data from the 4 centers with similar instrumentation to a unified database search. We have already shown that, on top of the sample preparation protocol and the instruments used during sample analysis, the database search adds a new layer of variability. This variability can be introduced by using different search settings, such as the false-discovery rate (FDR), the inclusion of different fixed and variable post-translational modifications, the number of missed cleavages, and the sequence database used in the search.
We included carbamidomethylation as a variable modification, since some centers had not specified whether the reduction and alkylation of proteins had been performed. Carbamidomethylated peptides were also used in the quantification of proteins. As for other variable modifications, we included only the default methionine oxidation and acetylation of protein N-termini. Similar to previous studies by us 15 and others, 25 we applied a 1% FDR at both the protein and peptide levels. We searched only for specific tryptic peptides and allowed up to two missed cleavages, which is a routine procedure. We applied only parameters that are well accepted in the community; 26 however, it should be noted that this uniform search does not undermine the validity of the previous database searches performed by the core facilities individually.
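The uniform search settings described above can be written down as a small configuration object, which also makes it easy to spot where a facility's own settings deviate. This is a minimal sketch: the class, field names, and modification labels are hypothetical and do not correspond to any particular search engine's parameter file, and the empty list of fixed modifications is an assumption, since the text does not name any.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SearchSettings:
    """Illustrative container for database search parameters."""
    enzyme: str = "trypsin"
    specific_cleavage: bool = True      # only fully tryptic peptides
    max_missed_cleavages: int = 2
    variable_modifications: tuple = (
        "carbamidomethylation (C)",
        "oxidation (M)",
        "acetylation (protein N-terminus)",
    )
    fixed_modifications: tuple = ()     # assumption: none named in the text
    peptide_fdr: float = 0.01
    protein_fdr: float = 0.01


def differing_settings(reference, other):
    """Return {field: (reference value, other value)} for fields that differ."""
    ref, oth = asdict(reference), asdict(other)
    return {key: (ref[key], oth[key]) for key in ref if ref[key] != oth[key]}


uniform_search = SearchSettings()

# Hypothetical settings from one facility, for comparison only.
facility_search = SearchSettings(max_missed_cleavages=1, protein_fdr=0.05)

if __name__ == "__main__":
    for field_name, (expected, found) in differing_settings(
        uniform_search, facility_search
    ).items():
        print(f"{field_name}: uniform={expected!r}, facility={found!r}")
```

Recording the settings in one place like this is one way to document exactly which parameters were harmonized when data from several facilities are reprocessed together.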
In the uniform search, we quantified 370 proteins (Supplementary Data 2), which is substantially lower than the combined number of proteins (n = 1824) individually reported by the centers. Centralized FDR control and the application of an identical database search and software are expected to reduce the number of quantified proteins in the aggregated search, as we have also reported. 15 As shown in Figure 4a, the application of the uniform search substantially improved the number and percentage of shared proteins in the data retrieved from the 4 core facilities using similar instrumentation. However, because the uniform database search reduces the number of quantified proteins, the percentage of shared proteins is a more accurate measure than the number of quantified proteins for comparing the individual searches with the uniform search. For proteins with no missing values across centers using similar instruments, the percentage of shared proteins increased from 18% in the individual searches to 40% in the uniform search. These results are consistent with our previous study, where, using a uniform database search, we found an overlap of 35.3% among the top 5 facilities (among 15 centers). 15 This is a remarkable increase, given that the MS instruments were not exactly the same and were chosen based on their expected similar performance. The hierarchical clustering in Figure 4b also shows that the data completeness across different centers was dramatically improved (compared with Figure 3b). Overall, these results show that the database search has a higher impact on data homogeneity than the other parameters investigated in this study.

■ DISCUSSION
The protein corona spontaneously forms and evolves around nanoparticles when they are exposed to biological tissues and fluids. Recognizing its critical role in influencing the efficacy and safety of nanotechnologies and nanomedicines, extensive research has been dedicated to characterizing the protein corona composition in terms of protein identity and abundance. 14 Despite numerous studies documented in the literature, efforts to reconcile data from independent studies and consolidate protein corona data sets for predicting nanoparticles' pharmacokinetics and biological fates are still limited. 27 While significant progress has been made in standardizing essential characterization methods for nanomedicines to ensure reproducibility and robustness across different research centers, 28 a standardized protocol for analyzing the protein corona composition remains lacking. This gap highlights a critical need in the field of nanomedicine, as the protein corona plays a pivotal role in determining the biological identity and behavior of nanoparticles. 6 MS-based proteomics is the preferred method for characterizing the protein corona. While LC-MS typically offers robust and reproducible data on cell and tissue samples within a consistent experimental framework, 29 its application to protein corona and plasma-related samples often encounters challenges that restrict proteome coverage across different studies. A significant challenge is the broad dynamic range of protein concentrations in plasma. For example, albumin alone constitutes approximately 55% of the total protein mass in plasma. This dominance by a few high-abundance proteins can mask the presence of lower-abundance proteins, which are crucial for comprehensive proteomic analysis and accurate characterization of the protein corona. 
30 In fact, the seven most abundant plasma proteins comprise 85% of the total protein in plasma, 30 and upon digestion of the plasma proteome, peptides from such abundant proteins crowd the mass spectra, making it challenging to quantify the other, lower-abundance proteins. This issue has been partially mitigated in the past by several depletion strategies that exploit immunodepletion spin columns, immunodepletion-LC, magnetic beads, and even nanoparticles themselves. 10,31,32 Such strategies are used to deplete albumin and other abundant proteins before sample analysis. We have also recently introduced a methodology in which a fine-tuned concentration of spiked phosphatidylcholine and a single nanoparticle were shown to deplete the 4 most abundant proteins in plasma, reducing their cumulative representation (MS intensities) from 90% to under 17% in whole plasma 33 and raising the plasma proteome coverage to about 446% of its starting value (from 322 to 1436 plasma proteins). Despite these challenges, thousands of proteins have been reliably quantified
Figure 2. Data overlaps among core facilities reporting better results (good) or having similar instrumentation (similar). a, The overlap in the quantified proteins among good or similar centers with regard to protein count and percentage. b, The overlap in the quantified proteins among good or similar centers with regard to protein count and percentage, limited to proteins quantified in all replicates. c, Distribution of protein-level intensities for the 6 cores (center line, median; box limits contain 50%; upper and lower quartiles, 75 and 25%; maximum, greatest value excluding outliers; minimum, least value excluding outliers; outliers, more than 1.5 times upper and lower quartiles). d, Principal component analysis (PCA) of the proteomics data collected from the centers.
−36 Altogether, variations in proteome coverage introduce bias in data interpretation by neglecting low-abundance proteins that could otherwise be genuine targets or disease biomarkers. The need for comprehensive proteome coverage is even more acute in the analysis of the nanoparticle protein corona, where the presence or absence of a biomarker carries even greater weight.
Previously, we showed that there are significant variations in the proteome coverage of identical protein corona samples across 17 core facilities. However, there was good agreement in the LC-MS data among the shared proteins across different core facilities, showing that one of the main challenges is achieving a high proteome coverage. 13 We showed that these variations arise from using different sample preparation protocols, various LC-MS workflows and instruments, as well as different database search parameters and data processing. 13 In a follow-up study, we showed that implementing a uniform database search and data processing pipeline can drastically improve data homogeneity (i.e., from 1.8% to 16.2% among the top 11 facilities). 15 In this study, we demonstrated that standardizing the sample preparation protocol, instrumentation, database search, and data processing significantly increases the percentage of shared proteins identified across different facilities in a stepwise manner. Our findings confirm that establishing standard protocols for LC-MS analysis of the protein corona can greatly enhance the consistency of proteomics data from various centers. Without such standardized protocols, it is challenging, if not impossible, to expect different core facilities to adopt identical settings for each step of the process. This difficulty arises from the diversity of available analytical tools, such as various types of analytical columns and database search engines. Hypothetically, homogeneity can be further improved by unifying all the instrumental settings, including, for example, the duration of the gradient, fragmentation techniques, number of scans, and resolution. Standardizing these elements is crucial for improving reproducibility and reliability in nanoparticle protein corona research, but will remain a formidable challenge.
In summary, the results of this study demonstrate the significant impact of unified protocols on reducing heterogeneity in the characterization of identical protein corona samples across different proteomics facilities. By implementing standardized sample preparation, using consistent instrumentation, and harmonizing data processing parameters, we achieved a significant improvement in the homogeneity of the final protein corona outcomes. This highlights the critical role of standardized practices in enhancing data reproducibility and emphasizes the potential barriers to their implementation due to the diverse capabilities and resources of different laboratories. Despite these challenges, our approach illustrates the possibility of significant improvements in data quality and consistency, paving the way for more reliable protein corona analysis. Within a given experiment, any of the core facilities can be used for monitoring the dynamic structure of the protein corona. The difficulty arises when results from multiple experiments are analyzed together, because missing values then limit the comparison. We recommend using a core facility that provides the highest proteome and sequence coverage (fewer missing values) and a decent CV. We further recommend designing comparative experiments in one batch, or at least using the same protocols, workflows, and instruments when designing multibatch experiments.
Moving forward, the establishment of universal standards for the analysis of the protein corona will be crucial for the advancement of nanomedicine, ensuring that findings from different studies are comparable and that the biological implications of nanoparticle−protein interactions are understood with greater clarity. The adoption of such standards promises to bridge the gap between nanoparticle research and clinical applications, ultimately enhancing the development of nanomedicine-based diagnostics and therapeutics. It is noteworthy that this standardized protocol can be applied to both soft and hard coronas, regardless of the collection methods used. The critical factor is ensuring that the methods employed prevent any protein contamination in the protein corona shell, as contamination can lead to significant errors in data interpretation. 37

Figure 1. Schematic showing the overall workflow of the study. After formation of 6 similar batches of protein corona−coated polystyrene nanoparticles, each individual batch was fully digested to peptides using our in-house developed protocol, mixed, aliquoted, and dried. The resulting identical aliquots were shipped to 6 different proteomics core facilities across the USA (one state includes two proteomics centers) to investigate the homogeneity/heterogeneity of the protein corona composition on the surface of the nanoparticles. Out of these 6 core facilities, 2 were designated as both good and similar. The cryo-TEM image of the corona-coated nanoparticle is reproduced here with permission from reference 13.

Figure 3. Uniqueness of the data obtained from the different core facilities. a-b, Hierarchical clustering of quantified proteins across the 4 good core facilities and 4 core facilities with similar instrumentation, respectively. c-d, Upset plot showing the variabilities in the number of detected proteins across the 4 good core facilities and 4 core facilities with similar instrumentation, respectively. All analyses were based on three technical replicates.

Figure 4. A uniform database search dramatically improves data homogeneity across different core facilities. a, The number and percentage of shared proteins with or without missing values in the uniform database search of LC-MS/MS data of the 4 centers using similar instruments, respectively. b, Proteins detected in the uniform database search of LC-MS/MS data of the 4 centers using similar instruments. All analyses were based on three technical replicates.

Table 1. Specific LC and MS Systems Employed by Each of the Six Core Facilities Involved in Our Study